-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update some methods to accept Path input #1681
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@takutosato Looks good overall. I found a few issues that need to be resolved before we can merge it. (including a bug I seem to have added years ago an no one has noticed I guess).
I'm also not really sure we should deprecate the file methods. In general we've just been keeping them along side the existing ones. Maybe it's time to start flipping them to deprecated, maybe people will complain. I'm unsure...
*/ | ||
@Deprecated | ||
public static void reheaderBamFile(final SAMFileHeader samFileHeader, final File inputFile, final File outputFile, final boolean createMd5, final boolean createIndex) { | ||
reheaderBamFile(samFileHeader, inputFile.toPath(), outputFile.toPath(), createMd5, createIndex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use IOUtil.toPath()
instead of file.toPath()
in these cases because it propagates the null in the case of a null input instead of NPE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
final long skipLast = ((term == BlockCompressedInputStream.FileTermination.HAS_TERMINATOR_BLOCK) && skipTerminator) ? | ||
BlockCompressedStreamConstants.EMPTY_GZIP_BLOCK.length : 0; | ||
final long bytesToWrite = length - skipLast - currentPos; | ||
|
||
IOUtil.transferByStream(in, outputStream, bytesToWrite); | ||
IOUtil.transferByStream(in, outputStream, bytesToWrite); // tsato: this method is manually buffered so make sure BufferedReader/Writer is not used (or does that matter?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double buffering is a pretty minor issue in most cases. If it wasn't dealing with it before I'm not worried about it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good
IOUtil.assertFileIsReadable(inputFile); | ||
IOUtil.assertFileIsWritable(outputFile); | ||
// IOUtil.assertFileIsWritable(outputFile); // tsato: what do I do with this... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question.. We could presumably adapt that too take a path but maybe that's complicated. There is an existing method assertDirectoryIsWriteable that you could run on the parent of the outputPath (with a null check added to make sure output file isn't null...)
I'm really not a huge fan of these preemptive checks. The good thing is that they produce sane error messages. The bad thing is that they can potentially fail in misleading ways (i.e. transient internet events, weird implementation of directories by google. ), and they have non-negligible performance costs in some cases.
That's a long way to say that it's probably fine to drop it, although I might add a null check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Added the null check. Another possibility (which I started to adopt in Picard) is to check writability if the Path refers to a local file e.g.
ValidationUtils.nonNull(inputFile);
ValidationUtils.nonNull(outputFile);
IOUtil.assertFileIsReadable(inputFile);
if (outputFile.toUri().getScheme().equals("file")){
IOUtil.assertFileIsWritable(outputFile.toFile());
}
Is this worth doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that solution. Lets do that for now but wrap it into a utility method so you don't have to keep rewriting that logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Done
final long length = inputFile.length(); | ||
final long currentPos = in.getCount(); // tsato: assuming currentPos is the position after writing the first block, this should be ok. | ||
// final long length = inputFile.length(); | ||
final long length = Files.size(inputFile); // tsato: rename to size | ||
final long skipLast = ((term == BlockCompressedInputStream.FileTermination.HAS_TERMINATOR_BLOCK) && skipTerminator) ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think length is still perfectly clear.
final long vOffsetOfFirstRecord = SAMUtils.findVirtualOffsetOfFirstRecordInBam(inputFile); | ||
final BlockCompressedInputStream blockIn = new BlockCompressedInputStream(inputFile); | ||
blockIn.seek(vOffsetOfFirstRecord); | ||
if (skipHeader) { // tsato: this method assumes bam file...should I test cram? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't worry about cram here, this method only works for bam files.
|
||
@Test | ||
public void testReheaderBamFile(){ | ||
final File originalBam = new File(this.NA12878_8000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we usually don't use this
when referring to a static variable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// Confirm that the reads are the same as the original | ||
Assert.assertTrue(compareBamReads(originalBam, output, DEFAULT_NUM_RECORDS_TO_COMPARE)); | ||
} catch (IOException e){ | ||
throw new HtsjdkException("Could not create a temporary output file.", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just add throws IOException to the test method. It's not clear that this error message will be correct, there are a bunch of IO operations going on during this method so it could be more confusing than helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
public void testBlockCopyBamFile(final boolean skipHeader, final boolean skipTerminator) throws IOException { | ||
System.out.println(skipHeader + ", " + skipTerminator); | ||
final File output = File.createTempFile("output", ".bam"); | ||
final OutputStream out = Files.newOutputStream(output.toPath()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try-with-resourcces
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
return true; | ||
} | ||
|
||
// tsato: kinda ugly? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eh, seems fine
* | ||
* @return true if the first (numRecordsToCompare) reads are equal, else false | ||
*/ | ||
private boolean compareBamReads(final File bam1, final File bam2, final int numRecordsToCompare){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is subtly wrong. If either iterarator is too short it will fail with an exception instead of returning false. Instead of reimplementing this, you should just use Assert.assertEquals()
which includes an overload for comparing two iterables.
It might make sense to keep this method and have it handle opening/closing the SamReaders but renaming it to assertBamRecordsEqual and use the existing comparators.
@lbergelson could you take a look at my revised code? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks google to me!
} | ||
|
||
final long pos = BlockCompressedFilePointerUtil.getBlockAddress(blockIn.getFilePointer()); | ||
blockIn.close(); // tsato: why doesn't IntelliJ say this is unnecessary? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could leave out that close. I don't know why intellij isn't telling you that but it looks like it's redundant to me.
Description
This PR makes changes needed for supporting cloud input/output files in some Picard tools. (see broadinstitute/picard#1804)
Things to think about before submitting: